Unsupervised Learning of Morphological Forests

نویسندگان

  • Jiaming Luo
  • Karthik Narasimhan
  • Regina Barzilay
چکیده

This paper focuses on unsupervised modeling of morphological families, collectively comprising a forest over the language vocabulary. This formulation enables us to capture edgewise properties reflecting single-step morphological derivations, along with global distributional properties of the entire forest. These global properties constrain the size of the affix set and encourage formation of tight morphological families. The resulting objective is solved using Integer Linear Programming (ILP) paired with contrastive estimation. We train the model by alternating between optimizing the local log-linear model and the global ILP objective. We evaluate our system on three tasks: root detection, clustering of morphological families, and segmentation. Our experiments demonstrate that our model yields consistent gains in all three tasks compared with the best published results.1

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Imputation of Missing Values for Unsupervised Data Using the Proximity in Random Forests

This paper presents a new procedure that imputes missing values by random forests for unsupervised data. We found that it works pretty well compared with k-nearest neighbor (kNN) and rough imputations replacing the median of the variables. Moreover, this procedure can be expanded to semisupervised data sets. The rate of the correct classification is higher than that of other conventional method...

متن کامل

Bootstrapping Morphological Analysis of Gı̃kũyũ Using Unsupervised Maximum Entropy Learning

This paper describes a proof-of-the-principle experiment in which maximum entropy learning is used for the automatic induction of shallow morphological features for the resourcescarce Bantu language of Gı̃kũyũ. This novel approach circumvents the limitations of typical unsupervised morphological induction methods that employ minimum-edit distance metrics to establish morphological similarity bet...

متن کامل

Bootstrapping morphological analysis of gĩkũyũ using unsupervised maximum entropy learning

This paper describes a proof-of-the-principle experiment in which maximum entropy learning is used for the automatic induction of shallow morphological features for the resourcescarce Bantu language of Gı̃kũyũ. This novel approach circumvents the limitations of typical unsupervised morphological induction methods that employ minimum-edit distance metrics to establish morphological similarity bet...

متن کامل

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Unsupervised morphological parsing of Bengali

Unsupervised morphological analysis is the task of segmenting words into prefixes, suffixes and stems without prior knowledge of language-specific morphotactics and morpho-phonological rules. This paper introduces a simple, yet highly effective algorithm for unsupervised morphological learning for Bengali, an Indo-Aryan language that is highly inflectional in nature. When evaluated on a set of ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • TACL

دوره 5  شماره 

صفحات  -

تاریخ انتشار 2017